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Abstract 

In the following article we develop a particle filter for approximating Feynman-Kac models with indicator 

potentials. Examples of such models include approximate Bayesian computation (ABC) posteriors associated 

^y-N with hidden Markov models (HMMs) or rare-event problems. Such models require the use of advanced particle 

^_^ filter or Markov chain Monte Carlo (MCMC) algorithms e.g. Jasra et al. (2012), to perform estimation. One of 

^^ the drawbacks of existing particle filters, is that they may 'collapse', in that the algorithm may terminate early, 

fSJ due to the indicator potentials. In this article, using a special case of the locally adaptive particle filter in Lee et 

< I al. (2013), which is closely related to Le Gland & Oudjane (2004), we use an algorithm which can deal with this 

C^ latter problem, whilst introducing a random cost per-time step. This algorithm is investigated from a theoretical 

^I perspective and several results are given which help to validate the algorithms and to provide guidelines for their 

^ ^ implementation. In addition, we show how this algorithm can be used within MCMC, using particle MCMC 

^H (Andrieu et al. 2010). Numerical examples are presented for ABC approximations of HMMs. 

C^ Key Words: Particle Filters, Markov Chain Monte Carlo, Feynman-Kac Formulae. 

O 1 Introduction 

Let {{£n^<^n)}n>i be a sequence of measurable spaces, {Gn{x) = IlB^(^)}n>i5 {x^Bn) G E^ x ^^, B^ C E^, be a 
sequence of indicator potentials and {M^ : E^-i x ^^ ^ [0, l]}^>i, with xq G Eq a fixed point, be a sequence of 
, ^, Markov kernels. Then for the collection of bounded and measurable functions (f G ^^(E^) the n— time Feynman-Kac 
marginal is: 

If) assuming that jnif) = ^xolYlpZi Gp{Xp)] is well-defined, where Ea;o[-] is the expectation w.r.t. the law of an 

■<-H inhomogeneous Markov chain with transition kernels {M„}„>i. Such models appear routinely in the statistics and 

^^ applied probability literature including: 



a 



o 

CO 



• 



ABC approximations (as in, e.g., Del Moral et al. (2012)) 

ABC approximations of HMMs (Dean et al. 2010;Jasra et al. 2012) 

Rare-Events problems (as in, e.g., Cerou et al. (2012)) 



k^ In order to perform estimation for such models, one often has to resort to numerical methods such as particle filters 

^ or MCMC; see the aforementioned references. 

C^ The basic particle filter, at time n and given a collection of samples A^ > 1 with non-zero potential on E^_^, will 

generate samples on E^ using the Markov kernels {M^}^>i and then sample with replacement amongst {a;^}i<i<Ar 
according to the normalized weights Gn{x\^)/ X].=i Gnix^^). The key issue with this basic particle filter is that, at 
any given time, there is no guarantee that any sample x^ lies in B^, and in some challenging scenarios, the algorithm 
can 'die-out' (or collapse), that is, that all of the samples have zero potentials. From an inference perspective, this is 
clearly an undesirable property and can lead to some poor performances; for example, many particle filters display 
time-uniform convergence properties, which has yet to be shown for this class of particle filters. For some classes of 
examples, e.g. Cerou et al. (2012) or Del Moral et al. (2012), there are some adaptive techniques which can reduce 
the possibility of the algorithm collapsing, but these are not always guaranteed to work in practice. In this article 
we develop a particle filter. This algorithm uses the same sampling mechanism, but the samples are generated until 
there is a prespecified number that are alive. This removes the possibility that the algorithm can collapse, but 
introduces a random cost per time- step. The algorithm turns out to be an important special case of the work in 
Lee et al. (2013) and is closely related to Le Gland & Oudjane (2004). 

The particle filter is analyzed from a theoretical perspective. In particular, under assumptions, we establish the 
following results: 



1. Time uniform Lp bounds for the particle filter estimates of r]n{^) 

2. A central limit theorem (CLT) for suitably normalized and centered particle filter estimates of r]n{^) 

3. An unbiased property of the particle filter estimates of 7n(v^) 

4. The relative variance of the particle filter estimates of 7n(^), assuming N = 0{n), is shown to grow linearly 
in n. 

Whilst all of these results are classical in the literature on particle filters (Cerou et al. 2011; Del Moral 2004), the 
proof in this new context requires some modifications. In the main, these technical adjustments are associated 
to Lp— bounds and CLTs for sums of random variables with a random number of summands (in the context of 
1.-2.). The technical results in 1.-2. not only verify the correctness of the new algorithm, but suggest a substantial 
improvement over the standard particle filter, at the cost of increased computational time. The results in 3.-4. are 
of particular interest when using the new particle filter within MCMC methodology (a particle MCMC (PMCMC) 
algorithm, (Andrieu et al. 2010)). There are variety of applications of such PMCMC algorithms, for example, when 
performing static parameter estimation for ABC approximations of HMMs. The results in 3.-4. not only allow 
one to construct new PMCMC algorithms, but also provide theoretical guidelines for their implementation. It is 
remarked that some of these results can also be found in Le Gland & Oudjane (2006) (with regards to 1. 2.), except 
for a different estimate; this is a critical difference between the work in this article and that in Le Gland & Oudjane 
(2006). In particular, as mentioned in result 3. our estimates of 7n(^), and in particular 7n(l) are unbiased and 
this allows one to develop principled MCMC methodology. We also note that, in Le Gland & Oudjane (2006) the 
authors do not give a time- uniform bound. 

The structure of this article is as follows. In Section 2 we provide a motivating example, ABC approximations of 
HMMs, for the construction of the particle filter, as well as the new particle filter itself. In Section 3 our theoretical 
results are provided along with some interpretation of their meaning. In Section 4 we implement the new particle 
filter for the motivating example and then develop a basic PMCMC algorithm using the guidelines in Section 3 for 
static parameter estimation associated to ABC approximations of HMMs. In Section 5 the article is concluded, 
with some discussion of future work. The appendix contains technical results for the theory in Section 3 and is split 
into three sections. 

2 Motivating Example and Algorithm 

2.1 Motivating Example 

We are given a HMM with observations {Yn}n>i, ^n ^ V C R^y, hidden states {Zn}n>o, ^n ^ X C R^-, Zq given. 
We assume: 

¥{Yn e A\{Zn}n>o) = I 9e{y\zn)dy n > 1 

JA 

and 

¥{Zn e A\{Zn}n>o) = I fe{z\zn-i)dy n > 1 

JA 

with 6> G 9 a static parameter and dy Lebesgue measure. 

We assume go{y\xn) is unknown (even up to an unbiased estimate), but one can sample from the associated 
distribution. In this scenario, one cannot apply a standard particle filter (or many other numerical approximation 
schemes). Dean et al. (2010) and Jasra et al. (2012) introduce the following ABC approximation of the joint 
smoothing density, for e > 0: 

^ . I , X Uk=i9e{yk\zk)fe{zk\xk-i) . . 

TTeyzi-.Tiiyi-.n) = 7 — ffn —^ — \ — ^y7 — \ ^ (Ij 

Jx- iik=i 9e\yk\zk)fe{zk\zk-i)dzi:n 

where 

and B^{yk) is the open ball centered at yk with radius e. 

We let be fixed and omit it from our notations; it is reintroduced later on. We introduce a Feynman-Kac 
representation of the ABC approximation described above. Let E^ = E = X x Y and define Gn : E ^ {0, 1}: 

Gn{x) =IIxxB,(y^)(^)- 



Now introduce Markov kernels {Mn}n>i, M^ : E x B(X x Y) ^ [0, 1] (B(-) are the Borel sets), with 

Mn{x,dx') = f{z'\z)g(u'\z')du'dz' 
with X = (z, u). Then the ABC predictor is for n > 1: 



where ip G S5(E) and 



-in{^) =% 



xo 






Vn{^) := -7TY, (2) 

7n(lj 



^ n— 1 

/ IT Gp{Xp)Mp{Xp-i,dXp)Mn{Xn-l,dXn)(p{Xn). (3) 



This provides a concrete example of the Feynman-Kac model in Section 1. In light of (2), we henceforth refer 
to 7n(l) as the normalizing constant. This quantity is of fundamental importance in a wide variety of statistical 
applications, notably in static parameter estimation, as it is equivalent to the marginal likelihood of the observed 
data Yi, . . . , Y^-i in contexts such as the ABC approximation presented above, as can be determined from (3). 



2.2 Old Filter 

Now define, for n > 2: 



$„(.„-0(^)-'"-^^''"-^'^"^^^^ 



r]n-l{Gn-l) 

The standard particle filter works by sampling x}, . . . , x^^ i.i.d from Mi(xo, •) and setting 

N 

N ■ 



1 ^ 



at times n > 2 sampling xj^, . . . ,x^ from ^ri{Vn-i)i')^ assuming that the system has not died out. 

2.3 Alive Filter 

We now discuss an idea which will prevent the particle filter from dying out; see also Lee et al. (2013) and Le Gland 
& Oudjane (2006). Throughout we assume that M^(x, B^) is not known for each x,n; if this is known, then one 
can develop alternative algorithms. At time 1, we sample x\^ . . . ^x-^^ i.i.d. from Mi(xo, •), where 

n 

Ti = inf{n > TV : ^Gi{x{) > N}. 

i=l 

Then, define 



Now, at time 2 sample ^2, . . . ,^2^, conditionally i.i.d. from ^2{Vi^){')^ where 

n 

T2 = inf{n > N : ^G2(4) > N}. 

This is continued until needed (i.e. with an obvious definition of T^^T^ etc). The idea here is that, at every time 
step, we retain A^ — 1 particles with non-zero weight, so that the algorithm never dies out, but with the additional 
issue that the computational cost per time-step is a random variable. The procedure is described in Algorithm 1. 
We note that the approach in Le Gland & Oudjane (2004) retains N alive particles, i.e. it differs only in step 
2(a) of Algorithm 1 by sampling instead ap_i uniformly on {/c G {1, . . . ,Tp_i} : Gp-i(x^_i) = 1}. This seemingly 
innocuous difference is, however, crucial to the unbiasedness results we develop in the sequel. 



Algorithm 1 Alive Particle Filter 



1. At time 1. For j = 1, 2, . . . until j =: Ti is reached such that Gi{x{) = 1 and ^l^i Gi{x\) = N: 

• Sample x{ from Mi(xo, •). 

2. At time 1 < p < n. For j = 1, 2, . . . until j =: Tp is reached such that Gp{x^p) = 1 and J^^^i Gp{Xp) = N: 

(a) Sample ap_i uniformly from {/c G {1, . . . ,Tp_i — 1} : G'p_i(x^_]^) = 1}. 

(b) Sample x^ from Mp{x J^-^ , •). 

2.3.1 Some Remarks 

We remark that one can show (Del Moral, 2004) that for n > 2, the normalizing constant is given by 



7n(i) = n ^p(^p)- 



Thus, a natural estimate of the normalizing constant is 

n-l n-1 ^ _ 

We note that the estimates of r]n and 7^ are different from those considered in Le Gland & Oudjane (2006). This 
is a critical point as in Proposition 3.1 we show that this estimate of the normalizing constant is unbiased which is 
crucial for using this idea inside MCMC algorithms. In this direction, one uses the particle filter to help propose 
values and there is an accept/reject step; we discuss this approach in Section 4.2. Again, it is clearly undesirable 
in an MCMC proposal, if the particle filter will collapse and so, our approach will prove to be very useful in this 
context. 

Other than the fact that this filter will not die out, in the context of our motivating example, there is also a 
natural use of this idea. This is because, one can envisage the arrival of an outlier or unusual data; in such scenarios, 
the alive particle filter will assign (most likely) more computational effort for dealing with this issue, which is not 
something that the standard filter is designed to do. 

A final remark is as follows; in our example B^ = X x B^{yn) and so, as assumed in this article in general, 
Mn{x^ Bn) is not known for each x, n. This removes the possibility of changing measure to Q (in the formula for 
7n(-)), (with finite dimensional marginal Q^) 

call the Markov kernels in the product Mp. This is because the new potential at time n is exactly: M^(x, B^). 
However, one can simulate from M^ and use an unbiased estimate of Mn{x, B^) for each particle. That is, we 
obtain samples z^'^^ z^'^'^ from Mp{Xp_i^ •) using R samples (total) from Mp{Xp_i, •) and then we set Xp = z^'^'^ (say) 
with associated weight 1/{R — 1). This particular procedure would then have a fixed number of particles with no 
possibility of collapsing. Other than the algorithm being convoluted, some particles Xp_i could be such that E[i?] 
is prohibitively large, even though E[Tp] is not very large, which provides a reasonable argument against such a 
scheme. 

3 Theoretical Results 

We will now present some theoretical results for the particle filter in Section 2.3. This Section could be skipped 
with little loss of continuity in the article; although we do provide numerical simulations to verify the behaviour 
that is predicted by the forthcoming theoretical results. 



3.1 Assumptions and Notations 

Define the following sequence of Markov kernels, for n > 1: 

We will make use of the following assumptions: 

• (Ml): For each there exist a (5 G (0, 1) such that for each n > 1, (x, ^) G E^ 

Mn{x,-) >SMn{yr)- 
In addition there exist a < c < 1 such that for each n > 1, x G E^_i, M^(B^)(x) A M^(B^)(x) > c. 

• (G): For each n > 

• {Mm): There exist m > 1 and Pp G [1, oo) such that for any p > 1 and (x, y) G Bp 

Mp^p+rnix, dz) < P^'^^Mp^p^rnix, dz) Mp^p+rn = Mp+lMp+2 • • • Mp^m- 

The final two conditions are (Hm) in Cerou et al. (2011); we also use the notation Sp^ = Yl^^^~ Sq. These 
assumptions are exceptionally strong, but we remark that for the scenario of interest, weaker conditions have not 
been used in the literature. Note that in addition, in the context of ABC, the assumptions are essentially qualitative 
as verifying them is very difficult (even on compact state-spaces) as the likelihood density is typically intractable. 
However, we still expect the phenomena reported in the below results to hold in some practical situations. We 
again remark that our results are relevant for scenarios other than ABC. 

In order to understand some of the subsequent results, we introduce some notations. For a probability measure 
on E (denoted 7^(E)) /i G V{E) and bounded measurable real- valued function (denoted Bb{^)) (f G S5(E), we write 
/i(v^) := J^(p{x)^{dx). For ip G Bb{^), \\ip\\oc := sup^^E lv^(^)l- For ip G Bb{^), Osc{ip) = sup(^^^)^E2 \^{x) - (p{y)\. 
For fi^u e 7^(E), ll/i — vWty denotes the total variation distance. For a non-negative operator on ^^(E), R{x^ •), and 
(f G S5(E), R{(p){x) = J^(p{y)R{x^dy). Iterates of i? are written R^{xo^dXn) = f R{xo,dxi) x ••• x R{xn-i^dxn)' 
We will use the semi-group Qn{x, dy) = Gn-i{x)Mn{x^ dy) with n > 2, with the convention for p < n, Qp,n{^){xp) = 
J Qp-\-i{xp,dXp-^i) . . . Qn{xn-i,dxn)(p{xn)^ whcrc (f G Bbi^)] when p = n, Qn,n is the identity operator. We also 
adopt the notation for /i G 7^(E) ^p^ri{jii){(p) = <l>^ o • • • o <l>p^i(//)((p), (p G S5(E), p < n; when p = n^ <l>n,n(M) 
is the identity operator. E denotes expectation w.r.t. the stochastic process which generates the algorithm, with 
corresponding probability P. It is assumed that ^0 = 1. Note the important formula 7n(v^) = [Ilq^i Vq{Gq)]r]n{^)^ 
(f G Bb{^). JV{fi^a'^) denotes the normal distribution with mean fi and variance a^. Qeo{p) denotes a geometric 
random variable (with support {1,2,...}) with success probability p. 

3.2 Predictor 

In this Section, we consider the long-time behaviour of approximation of the prediction filter 



1 "• 



In particular, the study of this latter behaviour w.r.t. the algorithm in Section 2.2, is difficult due to the fact that 
the algorithm can collapse. For example, in a slightly different context, it is shown in Del Moral & Doucet (2004) 
that the algorithm which can die out has an upper-bound on the Lp— error which increases with n (and under strong 
hypotheses as in this article). In general, we do not know of any time-uniform result for algorithms which can die 
out. Below, we restrict p G [1,4] as this is all that is needed for a strong law of large numbers. The additional 
technical results associated to Theorem 3.1 can be found in Appendix A. 

Theorem 3.1. Assume (Mi). Then for any p G [1,4] there exist a Cp < oo such that for any n > 1, N > 2, 



Proof. Throughout Cp is a finite positive constant (that does not depend upon n) whose value may change from 
fine to hue. The proof fohows that of Theorem 7.4.4 of Del Moral (2004). We have, using eq. 7.24 of Del Moral 

(2004) 

WLOG we suppose Osc((/:?) < 1. Now for x G B^ we define the Markov kernel Pg^n(^r) •= Qq,n{xr)/Qq,n{'^){x) 
with associated Dobrushin coefficient /3{Pq^n) = sup(^^^)^g2 \\Pq^n{x^ •) — Pq^n{y^ -)\\tv and also set r^^n = 
^^P(a?,y)GB2 Qg,n(l)(^)/Qg,n(l)(^)- Then following the calculations of pp. 245-246 of Del Moral (2004), we have 

n 

where Q^^ni^) ^^ defined in pp. 246 of Del Moral (2004) and note that ||Q^n(v^)lloo ^ 1- Application of Corollary 
A.l gives: 

^ n 

The sum on the R.H.S. can be bounded uniformly in n by using standard arguments in Cerou et al. (2011) or Del 
Moral (2004) and are hence omitted. This concludes the proof. D 

3.3 Central Limit Theorem 

In this Section we consider the asymptotic properties of a suitably normalized and centered estimate of the predictor; 
a central limit theorem. We note that such a result is not a direct corollary of existing CLTs for particle filters in 
the literature (e.g. Del Moral (2004)). The additional technical results associated to Theorem 3.2 can be found in 
Appendix B. 

Here we write: 

AT 



Vn-\v) = ^Y.^(^n) 



the empirical measure of the first N — 1 sampled particles at time n. The convergence in probability (written ^p) 
weak convergence (written ^) results are as A/" ^ oo. 

Theorem 3.2. Assume (Mi). Then for any n > 1, cp ^ ^^(E^) we have: 



where, setting fn = f — Vn if) 

<^1(V) = Vn(Gn)Vni^'i) + [crl-l(Qn{^n))]/[Vn{Gn)Vn-liGn-l)] n>2 

or equivalently for any n > 1 



JqiGqf 

Proof Our proof proceeds via induction. For the case n = 1, by Lemma B.l we need only deal with the term 



ali^) = r?„(G„) J2 ?7A^V,{[QUVn) - %(Q,,„(^„)r). (4) 

I ln[^n) 



V{N-l)vi{G^M-^-Vi]{'fi). 
Then one need only apply the CLT for i.i.d. bounded random variables; this yields the result with 

a!{^) = m{G^)m{{^-m{^)f)- 

We assume the result for n — 1 and consider n. Let (pn = ^ — Vn{^) then we have 



VTn - nVn" - VuKv) = VTn " l[vl" " '^niVnSm'Pn) + V^n " l$n(r?„"r )](^„). (5) 



Now the first term on the R.H.S. of (5) can be written as 

+ V(^-lh„(G„)[r?r' - <^n{vl"--m^n)- (6) 

In addition, the second term on the R.H.S. of (5) can be written as 



Vr„-i$„(r,^ri^)](^„) = 



Tn-l Vn-l{Gn-l) 



r„_i - 1 y r]n{G„) 



^Tn-l-l^n{Vn-,')K'Pn) 



/?7„_i(G„_i) 



By Lemma B.l the first term on the R.H.S. of (6) converges in probability to zero. Also, by Lemma B.3, Theorem 
3.1 (which provides a strong law of large numbers) and the induction hypothesis {r]n-i{Qn{^n)) = 0), the first term 
on the R.H.S. of (7) converges in probability to zero. Thus, by a corollary to Slutsky's theorem, we can consider 
the weak convergence of 

V{N-l)Vn{Gn)[v::-' - ^n{71n--^')]{Vn) + Sj ^^^'fj')'^ y^f::^^l^n{vl-\')](^n) ■= A{N) + B{N). 

We now consider the characteristic function: 

E[exp{it{A{N) + B{N))}] = E[{E[exp{itA{N)}\^n-i] - e"^- (^^-^^'/^l exp{zt5(A/')}] + e"^- ('^-)^'/^E[exp{it5(A/')}] 

(8) 
where 

and ^n-i is the filtration generated by the particle system up-to time n — 1. We deal with the limit of the 
expectations on the R.H.S. of (8) independently. We wih show that E[exp{itA{N)}\^ri-i] - e-^-^^-^^'/^ wih 
converge in probability to zero, by using Theorem A3 of Douc & Moulines (2008). To that end, we note that for 
any cp G Bbi^n)^ we have that 

1 ^"^ 

will converge in probability to zero (for example, by controlling the second moment with the Marcinkiewicz-Zygmund 
(M-Z) inequality). Then by Theorem 3.1 as ^{rj^'^~i){(p) converges almost surely to r]n{(p) that r]^~^{(p) will converge 
in probability to rjni^f)- Using this result it follows easily that 



N-l 






converges in probability to a^((/?n). This verifies the first condition of Theorem A3 of Douc & Moulines (2008) 
(eq. (31) of that paper). As cp is bounded, it is straightforward to verify the second (Lindeberg-type) condition of 
Theorem 13 of Douc & Moulines (2008) (eq. (32) of that paper). Thus, application of this latter theorem shows 
that E[ex.-p {it A{N)}\^n_i] — e"^^*^"^"^^^ /^ converges in probability to zero. Then by the induction hypothesis 

B{N) => Ar(0, [al_,{Qn{^nW[Vn{Gn)Vn-l{Gn-l)])- (9) 

Thus 

{E[exp{itA{N)}\^n-i] - e-^'('^-)^'/^}exp{it5(7V)} ^p 0. 

Application of Theorem 25.12 of Billingsley (1995), shows that 

lim E[{E[exp{ztA(7V)}|^n-i] -e"^'^^-^^'/^}exp{zt5(Ar)}] =0. 

Thus, returning to (8), we consider e~^^^'^''^* ^'^E[ex.p{itB{N)}]. Noting (9) and again applying Theorem 25.12 of 
Billingsley (1995) we yield 

lim E[exp{itB{N)}] = e-t^'-^^^-^^-^^^/t^-^^-^^-^^^-^^^''/^ 



Thus, we have proved that 
where 

^n(V^) = Vn{Gn)Vn{^l) + [al_^{Qn{(Pn))]/[Vn{Gn)Vn-l{Gn-l)]^ 

The verification of the formula (4) for the asymptotic variance follows standard calculations and is omitted. D 

Remark 3.1. The formula for the asymptotic variance of the particle filter in Section 2.2, pp. 304 of Del Moral 
(2004) is 

E ?^V,{[QU^n) - V,{Q,A^nm. (10) 

q=l ^^^^^ 

Comparing to the asymptotic variance formula (4)^ this latter formula is certainly smaller if for each 1 < q < n 

V,{G,f < lln{Gn). (11) 

An alternative interpretation of (11) is if {Vn}n>i is a Markov chain with transition kernels {Mn}n>i then (11) is 

¥{Vq e Bq\vi G Bi, . . . , Vq_i G Bq_if < P(K G B,|^i G Bi, . . . , V^-l G B,_i). 

Whilst this can be difficult to verify in general, if the spaces are E^ = E^ n > 1^ potentials On = G for each n > 1 
and the Markov kernels are such that for each n > 1, x ^ E, M^(x, •) = iy{-) for some v G V(^, then the L.H.S. of 
(11) is v(GY cind the R.H.S. is fiG); so in this ideal scenario, the new algorithm asymptotically outperforms the 
old one with regards to variance. In general, one might believe that (4) is smaller that the formula (10)^ as its 
leading term (when q = n) is smaller and the condition (11) can certainly hold in many examples. 

Remark 3.2. One can also prove a CLT for the estimate of the filter; a direct corollary is, under (Mi), using 
Theorem 3.2, we have 



r?I"(G„) Vn{Gn) 

where 

n /^ ,2 

^n(V^) = E :w;^V:f77^^^([^Q'^(^^[^^ ~ Vn{Gn^)]) - VqiQqAGni^n " Vn{Gn^)]))?) • 

In comparison, the asymptotic variance of the estimate in Theorem 4 of Le Gland & Oudjane (2006) (which differs 
to the one in this article) has asymptotic variance 

S^li^) = E ^^^^^%^^.([Q.."(Gnbn - Vn{Gn^)]) " »?, (Q,.„(G„ [<p„ - UGnV)]))f)- 

Thus there is an asymptotic difference between the two procedures. In general, our approach is better with regards 
to asymptotic variance if 

l,{G,f ^ 7.(1)7.(G,) 



ln{GnYVn{Gn) ' ln{GnY 

or using the Markov chain interpretation in Remark 3.1: 

P(FiGBi,...,ViGB,_i,F, gB,) 



nKeB,|^iGBi,...,^,_i gB,_i) 



<¥{yi GBi,...,y,_iGB,_i). 



In general, one cannot say which is preferable, but in the case: the spaces are E^ = E^ n > 1^ potentials Gn = G for 
each n > 1 and the Markov kernels are such that for each n > 1, x e E, Mn{x^ •) = i'{-) for some v G V(^, both 
the L.H.S. and R.H.S. of the inequality are equal. 

3.4 Normalizing Constant 

Define the estimate of the normalizing constant: 

n-l 






The technical results used in this Section can be found in Appendix C. 



3.4.1 Unbiasedness 

Proposition 3.1. We have for any n>l,N>2 and ip G Bi)(En), that 

Proof. The proof uses the standard Martingale difference decomposition in Del Moral (2004), with some additional 
expectation properties that need to be proved. The case n = 1 follows from Lemma C.l, so we assume n > 2. We 
remark that for p G {2, . . . , n}: 

and hence that 

n 

Then by Lemma C.l, it follows that 

E[7j''(i)[<- - %{n'/-\mQpAv>)Wv-i] = 

and hence that 

E[7r"M-7nM]=0 

from which we easily conclude the result. D 

3.4.2 Non-Asymptotic Variance Theorem 

Below the term Xl^^x "" (g\ is as in Cerou et al. (2011). The expressions and interpretations for (5s /3s can 
be found in Section 3.1. In addition, (t?^"^)®^ is the [/—statistic that is formed from our empirical measure rj^"^ and 
(^T^)02 -g ^Yie corresponding F-statistic. In addition (7^-)®^(F) = ll-~i (}-? [vl^)®'^ [F) for F G S6(E^). 

Proposition 3.2. Assume (Hm)- Then for any n>2, N > 3 



— ^(^)^(^) 



Tr^fU .2 






t Vs{Gs) ' ^Wln(X) V J ^7V^ r^,(Gs) * 

Proof The result follows essentially from Cerou et al. (2011). To modify the proof to our set-up, we will prove 
that for F : E^ ^ R+ (where the expectation on the L.H.S. is w.r.t. the stochastic process that generates the SMC 
algorithm) 

mi^-fHF)] < (^)\[r?f Q,Qf Q,...Qf Q„(F)] (12) 

where for each n > 1, independently 



N-1 



with corresponding joint expectation E^ and Ci{F){x^y) = F(x,x), Co{F){x^y) = F{x^y). Once (12) is proved 
this gives a verification of Lemma 3.2, eq. (3.3) of Cerou et al. (2011), given this, the rest of the argument then 
follows Proposition 3.4 of Cerou et al. (2011) and Theorem 5.1 and Corollary 5.2 in Cerou et al. (2011) (note that 
the fact that we have an upper-bound with a = (as in Cerou et al. (2011)) does not modify the result). We will 
write expectations w.r.t. the probability space associated to the particle system enlarged with the (independent) 

{Cn}n>l as E^. 

Thus, we consider the proof of (12). We have 



Now 



E[(7r")®'(J^)l^n-i] = 7j"(l)'E[(r?^)«2(F)|^„_i]. 



E[(r,^)«2(F)|^„_i] = E[p^(r,^)e2(F) + ^,?J"(C(f)) 



^ 1 

^^n— 1 



< $„(r?^ri^)«^F) + ^$„(r?^ri^)(C(F)) 



where we have used (T^ — 2)/(Tn — 1) < 1, l/{Tn — 1) < l/(^ — 1) and Lemmas C.2 and C.l to obtain the second 
hne. Thus we have that 

Using the above inequahty, one can repeat the argument inductively to deduce (12). This completes the proof of 
the Proposition. D 

Remark 3.3. The significance of the result is simply that if 

^(m)^(m) 

s Vs{Be{ys)) 
then if N > en the relative variance will be constant in n. This will he useful for the PMCMC algorithm in Section 

4 Numerical Implementation 

4.1 ABC Filtering 

4.1.1 Linear Gaussian model 

To investigate the ahve particle filter, we consider the following linear Gaussian state space model (with all quantities 
one-dimensional) : 

Yn=2Zn^Wn, t>l 

where Vn ~ A/'(0,cr^), and independently Wn ~ A/'(0,cr^). Our objective is to fit an ABC approximation of this 
HMM; this is simply to investigate the algorithm constructed in this article. 

4.1.2 Set up 

Data are simulated from the (true) model for T = 5000 time steps and a^ G {0.1,1,5} and a^ G {0.1,1,5}. 

For n G {1,...,T}, if Pn ^ 5^7 where pt "^ * ^o,l] (the uniform distribution on [0,1]), we have Y^ = c, where 
c G {80, 90, . . . 140, 150}. Recall Be{y) = {u : \u — y\ < e} and we consider a fixed sequence of e which values belong 
to set {5, 10, 15}, i.e. e G {5, 10, 15}. We compare the alive particle filter to the approach in Jasra et al. (2012). 

The proposal dynamics are as described in Section 2.3. For the approach in Jasra et al. (2012), N = 2000 and 
we resample every time. For the alive particle filter, we used N = 1500 particles; this is to keep the computation 
time approximately equal. We also estimate the normalizing constant via the alive filter at each time step and 
compare it with 'exact' values obtained via the Kalman filter in the limiting case e = 0. To assess the performance 
in normalizing constant estimation, the relative variance is estimated via independent runs of the procedure. 

Our results are constructed as two parts. In the first part, we compare the performance of two particle filters 
under different scenarios. In the second part, we focus on examples where the approach in Jasra et al. (2012) 
collapses. All results were averaged over 50 runs. We note that, with regards to the results in this Section and the 
approach in Le Gland & Oudjane (2004); generally similar conclusions can be drawn with regards to comparison 
to the approach in Jasra et al. (2012). 

4.1.3 Part I 

In this part, the analyses of the alive particle filter were completed in approximately 115 seconds and approximately 
103 seconds were taken for the approach in Jasra et al. (2012) (which we just term the particle filter). Our results 
are shown in Figures 1-6. 

Figure 1 displays the log relative error for the alive filter to the particle filter. We present the time evolution of 
the Li log relative error between the 'exact' and estimated first moment. From our results, the mean log relative 
error for each panel is {0.06,0.04,0.07}. Figure 2 plots the absolute Li error of the alive particle filter error across 

10 



time. These results indicate, in the scenarios under study, that both filters are performing about the same time 
with regards to estimating the filter. This is unsurprising as both methods use essentially the same information, 
and the outlying values do not lead to a collapse of the particle filter. In addition, the behaviour in Figure 2, 
which is predicted in Theorem 3.1 under strong assumptions, appears to hold in a situation where the state-space 
is non-compact. 

In Figure 3, we show the time evolution of the log of the normalizing constant estimate for three approaches, 
i.e. Kalman filter (black '-' line), new ABC filter (red '--' line) and SMC method (blue '••' line). Figure 4 displays 
the (log) relative variance of the estimate of the normalizing constant via the alive particle filter, when using the 
Kalman filter as the ground truth. In Figure 3, there is unsurprisingly a bias in estimation of the normalizing 
constant, as the ABC approximation is not exact, i.e. e ^ 0. In Figure 4 the linear decay in variance proven in 
Proposition 3.2 is demonstrated (although under a log transformation). 

In Figure 5 and 6, we show the number of particles used at each time step (that is to achieve N alive particles) of 
the alive filter (Figure 5) and the number of alive particles for the standard particle filter (Figure 6). Both Figures 
illustrate the effect of outlying data, where the alive filter has to work 'harder' (i.e. assigns more computational 
effort), whereas the standard filter just loses particles. 
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Figure 1: Estimation error of the first moment for the hnear state space model. Each panel displays (Log) the ratio of Li error of the 
alive filter to old filter. 
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Figure 2: Estimation error of the first moment for the linear state space model using the alive particle filter (red ':*r' indicates the 
X-axis position of outlier) 



4.1.4 Part II 

In this part, we keep the initial conditions the same as in the previous Section but change the value of e. Instead 
of using e G {5,10,15}, we set smaller values to e, i.e. e G {3,6,12} (recall the smaller e, the closer the ABC 
approximation is to the true HMM (Theorem 1 of Jasra et al. (2012)). This change makes the standard particle 
filter collapse whereas the alive filter does not have this problem. All results were averaged over 50 runs and our 
results are shown in Figures 7-8. 
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Figure 3: Estimated normalizing constant for the linear state space model: Kalman filter (black '-'), alive filter (red '--') and old 
filter (blue '••'). Each panel displays the estimated normalizing constant across time. 
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Figure 4: (Log) Relative variance of normalizing constant of alive filter to Kalman filter for the linear state space model. Each panel 
displays the relative variance across time. 



In Figure 7, we present the true simulated hidden trajectory along with a plot of the estimated Xt given by the 
two particle filters across time when {(Jy^Gw) = (>/5, V^)- As shown in Figure 7, the alive filter can provide better 
estimation versus the old particle filter. Figure 8 displays the log relative error of the alive filter to old particle filter, 
which supports the previous point made, with regards to estimation of the hidden state. Based upon the results 
displayed, the alive filter can provide good estimation results under the same conditions when the old particle filter 
collapses. 

4.2 Particle MCMC 

We now utilize the results in Propositions 3.1-3.2. In particular. Proposition 3.1 allows us to construct an MCMC 
method for performing static parameter inference in the context of ABC approximations of HMMs. 
Recall Section 2.1. Our objective is to sample from the posterior density: 



7^{0\yi:n) 



Ixr^U]^=i9e{yk\zk)fe{zk\zk-i)dzi:n7r{e) 
Ixr^xQYlk=i9e(yk\zk)fe{zk\zk-i)dzi:n7T{0)dO 



(13) 



where ^^, fo is as (1) and 7r{0) is a prior probability density on O. Throughout the Section, we set TV > 2, e > 0, 
but in general omit dependencies on these quantities. In practice, one often seeks to sample from an associated 
probability on the extended state-space E"^ x 



Tf{0,Zi:n,Ui:n\yi:n) CX Y[^B,iyk){'^k)ge{Uk\Zk) fe{Zk\Zk-l)7r{0) . 



k=l 



It is then easily verified that for any fixed e 



7T{0\yi:n) = / 7^{0 , Zi:n, Ui:n\yi:n)dZi:ndUi:n- 
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Figure 5: Number of particles used for alive filter for the linear state space model. Each panel displays the number of particles across 
time. 
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Figure 6: Number of particles used for the particle filter of Jasra et al. (2012) for the linear state space model. Each panel displays 
the number of particles across time. 



A typical way to sample from 7r(6>, zi:^, '^^iml^im) is via the Metropolis-Hastings method, with proposing to move 
from {0^zi:n^ui:n) to {0^ ^ z[.^^ u[.^) via the probability density: 



q{0'\0)l[9e'iu',\z',)fe,{z',\z',_,) 



k=l 

such a proposal removes the need to evaluate go which is not available in this context. As is well known e.g. Andrieu 
et al. (2010), such procedures typically do not work very well and lead to slow mixing on the parameter space 6. This 
proposal can be greatly improved by running a particle-filter (the particle marginal Metropolis-Hastings (PMMH) 
algorithm) as in Andrieu et al. (2010); that is a Metropolis-Hastings move that will first move 6>, via q{0'\0) and 
then run the algorithm in Section 2.2 picking a whole path, /, x{.^ G E"^ the sample used with a probability 
proportional to Gn{xlj^). Remarkably, this procedure yields samples from (13) via an auxiliary probability density; 
the details can be found in Andrieu et al. (2010), but the apparently fundamental property is that the estimate of 
the normalizing constant is unbiased. Note also that the sample from the Markov chain {0^x{.^) also provides a 
sample from 7r(6>, zi:^, ^iiml^im)- 

As we have seen in the context of both theory and applications, it appears that the alive filter in Section 2.3 
out-performs the standard one, for a given computational complexity. In addition, as seen in Proposition 3.1, the 
estimate of the normalizing constant is unbiased. It is therefore a reasonable conjecture that one can construct 
a new PMMH algorithm, with the alive particle filter investigated previously in this article and that this might 
perform better (in some sense) than the standard PMMH just described. We remark that the justification of this 
new PMMH follows from the statements in Andrieu & Vihola (2012) (see also Andrieu & Roberts (2009)) and 
Proposition 3.1, but we provide details for completeness. 

4.2.1 New PMMH Kernel 

We will define an appropriate target probability to produce samples from (13), but we first give the algorithm: 
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Figure 7: (a) 'True' Zt and (b) estimated Zt across time for the linear state space model, where red ('-') indicates the alive particle 
filter and black '••' indicates the particle filter. 
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Figure 8: Estimation error of the first moment for the linear state space model. Each panel displays (Log) the ratio of Li error of 
alive filter to the particle filter. 



1. Sample 6>(0) from any absolutely continuous distribution. Then run the particle filter (with parameter value 
^(0)) in Section 2.3 up-to time n, storing 7^i(l) (now denoted 7^i 6)(o)(l))- Pick a trajectory x^.^(O), 
i G {1, . . . , T^(0) — 1}, with probability 

G„«(0)) 



E-=r"'G„(xuo)) 



Set i = 1. 



2. Propose 0'\0{i — 1) from a proposal with positive density on (write it g'(^'|^)).Then run the particle filter 
(with parameter value 6^) in Section 2.3 up-to time n, storing Jn-\-i e'i'^)- Pick a trajectory {x\.^y with 
probability 

Set e{i) = e', 7^+i,e(i)(l) = ^u+i,0'i^) with probability: 



lA 



7^+i,g>(l) 7rid')qiO(i-l)\0') 



Otherwise set 6{i) = 6{i — 1), 7^^^ e(i)i^) ~ ^n-\-i 6'fi-i) (-'-)' ^ = ^ + 1 ^ind return to the start of 2. 

For readers interested in the numerical implementation, they can skip to the next Section, noting that the samples 
will come from the posterior (13); this is now justified in the rest of the section. 



14 



We construct the following auxiliary target probability on the state-space: 



oo 



E = ex( U (e^^ X {Ti} X ( U (e^^x{1,...,Ti-1}^^x{T2}x---x |J (e^" x 

Ti=N T2=N Tr^=N 



{!,..., Tn-i - iV- X {T4 X {1, . . . , T, - 1}) • • • ))) . 



Whilst the state-space looks complicated it corresponds to the static parameter and all the variables (the states 
and the resampled indices) sampled by the alive particle filter up-to time-step n and then just the picking of one of 
the final paths. 

For n > 2 (we omit from our notation) define 

^n I ^V*^n' • • • ' "^n ) i ^n—l'> ' ' ' "> ^n—l'> n\^n—l i-'-n—lj • 
He (t^ r^- T )(^^-^)r\^^ Gr.-^ixy_-') ^ . al-1 . i. 

l^Tr^=N Z^a^'-'^'} e{l,...,Tr,-i-l} \N-l) JE^n '^Sr.K^n^ • ' ' i^n"" iTn) 11^=1 t^_i-i "^ — -Mn[X^_^ .dx^J 

where for n > 1 

Sn = {(iii,...,ii^,T,)GY-x{7V,iV + l,...}: ^ Ib,(,.)«) = iV - 1 H ix^ G 5e(^n)}. 

In addition, set 

,(,,^ T,,rr.\ hM,■■■,xl\T,){^NZl)Yf^^M^{xo,dx\) 

Wi( a(a;j;, . . . ,x-y^),Ti] := = ■. 

^ ' E^=iv(N:i)/E-iIs.(a;l,...,xrSTi)[nr=iMi(xo,d4) 

Then the PMMH algorithm just defined samples from the target 

n 
^(^,d(xi,...,X,),ai:,_l,/,Ti:,|^l:,) ^ Gn{x\)-i'^^^^Q(X)\\^ \d{x\, . . . ,X^^^^ 

fc=2 

^^(d(x\,...,xl-),T^^{d). 

where a/e = (aj., . . . , o^\ ^k = {xl^ . . . , x^^) and / G {l,...,Tn — 1}. Using Proposition 3.1, one can easily verify 
that for any fixed 6> G O 

7r(6>|^l:n) = / 7f(6>,d(xi,...,Xn),ai:n-l,A:,ti:n|^l:n). 

Je\q 

Note also that the samples {0^x{.^) from tt are marginally distributed according to 7r(6>, zi:n, i^iml^im)- The associ- 
ated ergodicity of the new PMMH algorithm follows the construction in Andrieu et al. (2010) and we omit details 
for brevity. 

4.2.2 Implementation on Real Data 

We consider the following state-space model, for n > 1 

Yn =£n/^exp(Zn) 
Zn =<j)Zn-l + CrVn 

where Sn ~ 5t(0,^i,^2,^3) (a stable distribution with location parameter 0, scale ^i, skewness parameter ^2 and 
stability parameter ^3) and K ^ A/'(0,c). We set = (/3,c,0), with priors c - X^(2, 1/100), - 2^(2,1/50) 
(X^(a, b) is an inverse Gamma distribution with mode b/{a + 1)) and /3 ~ A/'(0, 10). Note that the inverse Gamma 
distributions have infinite variance. 

We consider the daily (adjust closing) index of the S & P 500 index between 03/01/2011 - 14/02/2013 (533 
data points). Our data are the log-returns, that is, if I^ is the index value at time n, Y^ = log(/^//^_i). The data 
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are displayed in Figure 9. The stable distribution may help us to more realistically capture heavy tails prevalent in 
financial data, than perhaps a standard Gaussian. In most scenarios, the probability density function of a stable 
distribution is intractable, which suggests that an ABC approximation might be a sensitive way to approximate 
the true model. 



1,500 










(b) 




0.04 




li 1 









M 




Iniw 


Lf - 


0.04 




'1 




- 



200 400 

time 



600 



200 400 

time 



600 



Figure 9: S & P 500 (a) index data and (b) (Log) Daily return 



4.2.3 Algorithm setup 

We consider two scenarios to compare the standard PMMH algorithm and the new one developed above. In the 
first situation we set ^3 = 1.75 and in the second, ^3 = 1.2, with ^1 = ^2 = 1 in both situations. In the first case, we 
make e a suitable value as the data are not expected to jump off the same scale as the initial data. In the second, e 
is significantly reduced; this is to illustrate a point about the algorithm we introduce. Both algorithms are run for 
about the same computational time, such that the new PMMH algorithm has 20000 iterations. The parameters are 
initialized with draws from the priors. The proposal on /3 is a normal random walk and for (c, (j)) a gamma proposal 
centered at the current point with proposal variance scaled to obtain reasonable acceptance rates. We consider 
A/" G {10, 100, 1000} and for the new PMMH algorithm this value is lower to allow the same computational time. 

4.2.4 Results 

Our results are presented in Figures 10-13. In Figures 10-11 we can see the output in the case that ^3 = 1.75. 
For all cases, it appears that both algorithms perform very well; the acceptance rates were around 0.25 for each 
case. For the PMMH algorithm the average number of simulations of the data, per-iteration and data-point, 
were (1636,745,365) for N G {1000,100,10} respectively (recall we have modified N to make the computational 
time similar to the standard PMMH). For this scenario one would prefer the standard PMMH as the algorithmic 
performance is very good, with a removal of a random computation cost per iteration. 

In Figures 12-13 the output when ^ = 1.2 is displayed. In Figure 12 we can see that the standard PMMH 
algorithm performs very badly, barely moving across the parameter space, whereas the new PMMH algorithm has 
very reasonable performance (Figure 13). In this case, e is very small, and the standard SMC collapses very often, 
which leads to the undesirable performance displayed. We note that considerable effort was expended in trying to 
get the standard PMMH algorithm to work in this case, but we did not manage to do so (so we do not claim that 
the algorithm cannot be made to work). Note also that whilst these are just one run of the algorithms, we have 
seen this behaviour in many other cases and it is typical in these examples. The results here suggest that the new 
PMMH kernel might be preferred in difficult sampling scenarios, but in simple cases it does not seem to be required. 
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Figure 10: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.2. Each : 
displays the samples with different A^. Here ^3 = 1.75. 
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Figure 11: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.3. Each row 
displays the samples with different A^. Here ^3 = 1.75. 
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Figure 12: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.2. Each row- 
displays the samples with different A^. Here ^3 = 1.2. 
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Figure 13: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.3. Each row 
displays the samples with different N. Here ^3 = 1.2. 
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5 Summary 

In this article we have investigated the ahve particle filter; we developed and analyzed new particle estimates 
and derived new and principled MCMC algorithms. There are several extensions to the work in this article. 
Firstly, we have presented and analyzed the most standard particle filter; one can investigate more intricate filters 
commensurate with the current state of the art. Secondly, our theoretical results appear to hold under much weaker 
conditions than adopted (Section 4.1); one could extend the results in this direction. Thirdly, we have presented 
the most basic PMCMC algorithm; one can extend to particle Gibbs methods and beyond. Finally, one can also 
use the SMC theory in this article to interact with that of MCMC theory to investigate the performance of our 
PMCMC procedures. 
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A Technical Results for the Predictor 

The main result of this Section is below. Note we use the convention ^i{r]Q^){(p) = r]i{(p) and recall ^n is the 
filtration generated by the particle system up-to time n. 

Corollary A.l. Assume (Mi). Then for any p G [1,4] there exists a Cp < oo such that for any n>l,N>2 and 



Proof The case n = 1 follows directly from Lemma A.l and (Mi), so we consider n > 2. For n > 2, by Lemma C.l 
^[^n'^(^)l^n-i] = ^n{Vn-~i)i^)^ ^^ Conditional upon ^n-i we are in the setting of Lemma A.l. By (Mi) we can 
verify that $n(^n-l^)(^n) A ^ri{'nn-~i^)i^n) — ^ for some deterministic constant 1 > c > and hence application of 
Lemma A.l proves the result. D 

A.l Additional Technical Results 

In the following section let (F, ^) be a measurable space and X^, X^, ... be i.i.d. random variables on F associated 
to z/ G V{f). Let B e^ be such that 

u{B)Au{B'') >c 

for some 1 > c > 0. Let N > 2 and define 

n 

T := inf{n > 1 : ^Ib(X^) > X}. 

i=l 

Note that T is a negative Binomial random variable, with parameters X and success probability i^{B). We will 
consider some Lp— properties of 

with cp G ^^(F). Expectations are written as E. Note that one can follow the proof of Lemma C.l to show that 

T-l 



^E^(^^) 



iy{ip). 



E 

7- 1 

We then have the following technical results which are used in the main text. 

Lemma A.l. For any p G [1,4] there exist a Cp < oo such that for any, N >2, and (/? G S6(F) 
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1 ^' "^ '/^ 

-3^^[^(X0-K^)] 



< c,M\c 



^/N~^^ 
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Proof. Throughout Cp is a finite positive constant (that only depends upon p) whose value may change from line 
to line. WLOG we will assume that i^i^p) = 0. By the Minkowski inequality 



E 



1 ^"^ 



1/p 



< E 



E 



1 ^"^ 



N -1 iy{lB(p) T -N iy{lB-(p) 



T-1 v{B) T-1 v{B^) 



i/p 



TV - 1 v{Ib^) T -N v{Ib<^lp) 



T-1 u{B) T-1 u{B^) 



i/p 



(14) 



Lemma A. 2 will control the second term on the R.H.S. so we focus on the first term on the R.H.S.. 
We have 

TV - 1 v{Ib^) T-N v{lB<^if) "^ ^/^ 

y iO\ A. ] — ' 

T 



E 



1 ^-^ 



T-1 v{B) T-1 iy{B^) 



E 



T-1 



-J2lBiX')[p(X^) 



i=l 



iy{lB(p) 



T-1 



v{B) 



^i:'-(^-)[-(^-)-^^'^" 



i=l 



^) 



1/p 



Now conditioning upon T (so that N — 1 samples lie in B and T — N \\e in 5^ and we subtract the conditional 
expectations of the (conditionally) independent random variables) and applying an appropriately modified version 
of the M-Z inequality (e.g. Chapter 7 of Del Moral (2004)) we have 



E 



1 V,rY.^ N-lvilB^) T-Nv{Ib^^) 
_lZ^^^ ; j._^ ^/m ^ T-1 v(B^) 



iB-) 



1/p 



< 



C„E 



i— ^ ((T - i)(ii^iu(i + -y)Y + (T - N)mui + ^)r) 



ii/p 



where we are using the conditional distribution of X^ ^ . . . ^X^~^ ^ given T. Setting c = (||(/:)||oo(l + VCb)))^ ^ 
(lklloo(l + ^^)Y we have that 



E 



T 



1 v\,rYM ^-^^^^B^) I T-A^KgBc(/:^) 



.(^c) 



1/p 



< CpC^^^W. 



1 / T-N \ 

1W2 V ^ T - 1 J 



iVp 



(T - 1)^/ 



and then noting 1/(T — 1) < 1/(TV — 1), (1 + ^_^ ) < 2 and using i^(5) A i/(5^) > c for dealing with c we have 
proved that 



E 



_lZ^^^ ; ^_-^ ^/m ^ T-1 v(B^) 



< 






T-l^''^"' T-1 i/(5) T-1 i/(5c) 
Returning to (14) and using Lemma A. 2, along with above result allows us to complete the proof. 
Lemma A. 2. For any p G [1,4] there exist a Cp < oo such that for any, N >2, and ip G ^^(F) 
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N -1 v{Ib^) T -N vilB^Lp) 



T-1 v{B) T-1 v{B^) 



iy{ip) 



1/p 
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CpMc 
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Proof Throughout Cp is a finite positive constant (that only depends upon p) whose value may change from line 
to line. WLOG we will assume that ^{(p) = 0, so that ^{Ib^p) = —^'{Ib^^)' Then we have that 



C(Ar,T) := 



N -1 iy{lB(p) T-N iy{lB-(p) 
T-1 u{B) ^ T-1 u{B^) 

v{Ib^)N \N -1^ v{B) - Tv{B) 



v{B){l-v{B)) 



N{T -1) 



Now, via Minkowski 



Enc(ArT)nv. < KIb^)I^ u\f 1 A,_ZMg) 

^'^^ '^' ^ - v{B)(l-v{B))\ [\{T-iy) N 



1/p 



-E 



u{B) - 1 



N{T-1) 



1/p. 



(15) 
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As 1/(T — 1) < 1/(A^ — 1), we will focus on controlling the term 



E 



Tu{B) 



N 



i/p 



If y , y , . . . are independent Qeo{i'{B)) random variables then 



E 



Tv{B) 



N 



i/p 



= iy{B)E 



1 ^ 1 



\N 



v{B) 



i/p 



and applying an appropriately modified version of the M-Z inequality (e.g. Chapter 7 of Del Moral (2004)) we have 



E 



Tv{B) 



N 






7VL 



{l-u{B){l-u{B)^u{B)')) 



v{BY 



i/p 



where we have used the fourth central moment of a Geometric random variable and Cp is a constant that only 
depends upon p (that is independent of v or B). Returning to (15) and noting I'^B) A v{B^) > c, we have shown 
that 



E[\c{N,T)\p]^/p<cy\U 



N ,a 



'N-l'^ N' 



from which we can easily conclude. 



n 



B Technical Results for the CLT 

Recall convergence in probability is written ^p and N is going to oo. In addition, that the convention ^i{r]Q^){(p) = 
r]i{(f) is used and again recall ^n is the filtration generated by the particle system up-to time n. 

Lemma B.l. Assume (Mi). Then for any n > 1, cp ^ Bbi^n) '^^ have: 

Vt:^{vI" - $„(r?r"r )]M - v/(iV-l)r?„(G„)[,?r' - '^nivlT^W) ^p 0. 

Proof. We give the proof for any n > 2; the case n = 1 follows a similar proof with only notational modifications. 
Throughout the proof < C < oo is a deterministic constant independent of n and N whose value may change 
from line to line. Our proof follows a similar construction to that found in pp. 369 of Billingsley (1995). To that 
end, we have 



Vt:^{€" - *„(^:r/)](^) - v/(iV-l)r?„(G„)[r?^-^ - -^niVn-^^m = 



VTn - l[r?^ - '^nivlr.m^) - (T„ - 1)^ 



(7V-1) 



ivl" - ^n{Vn--,')]{^) + 



n - l)y ^I^I'?"" - *n(^r-"/)]M - V(iV-l)r?„(G„)[r?r' - *„(r?r"r)](^). 



(T, 



In Lemma B.2, we have shown that (16) converges in probability to zero; hence, we focus upon (17). 
To shorten the subsequent notations, we set 






(T„-l)[r?^-$„(C-r)]M 



Then let 1 > £ > be given, and consider: 
N -I 



>e. 



N ■ 



< 



Tn-l 



rin{Gn) 



>s-\N-l)\ + 



max 



11n{Gn) 

5^M-5r'M >£(iV- 1)1/2 



(16) 
(17) 
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< 



T.-1 



7V-1 



VniGn) 



>e^(7V-l) U2P( max \S'^{^)\ > e{N - 1)'/^ . 



Now for the latter probability, one can condition upon ^n-\ and apply Kolmogorov's inequality noting that the 
conditional variance of (^{x) — $n(^n-l^)(v^) ^^ deterministically upper-bounded by Osc((/9)^. Hence, we have that 



< 



7V-1 



>e, 



I N-1 

VniGn) 



T. - 1 



>6:''(7V-1) +2C£. 



VniGn) 

Noting Lemma B.3 and that we can make e arbitrarily small, the proof is completed. 
Lemma B.2. Assume (Mi). Then for any n > 1, cp ^ B^iEn) we have: 



D 



y^^:^[e - ^nivl-l')]i^) - iTn - l)y ^I^I^n^ " ^ nivl-\' )]i^) ^F 0. 

Proof. We give the proof for any n > 2; the case n = 1 follows a similar proof with only notational modifications. 
Throughout the proof < C < oo is a deterministic constant independent of n and TV whose value may change 
from line to line. We will prove that 



VT:^l[ril- - ^n{n'n-\')]{v) - {Tn " ^) J J^TT^ivl" " '^n{vl--,'W) 



will go to zero in Li. To that end, we rewrite this expression as 



A{N):=[vl--<^n{Vu--:)]{'fi) 
To simplify the subsequent notations, we define 

B{N) := 



.v'F^T 



VVn{Gn) 



{N-1) 



(T„ - l)r?„(G„) 



(N-l) 



(T„ - l)r?„(G„) 



Then a simple application of Holder's inequality gives 



E[|A(A^)|] <E 
We will show that: 



[v^"-'^u{vll-^m^) 
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2/3 



E[\B{N)\ 



311/3 



1. E 



[r?^"-$„(r?:"r)](^) 



T^-1 



/N~- 



^\/UGJ 



is independent of N. 
2. lim^^ooE[|5(7V)p]V3=0. 



3/2' 



-|2/3 



is upper-bounded by a finite deterministic constant C that 



This will conclude the proof. 

Proof of 1. We have, by another application of Holder, that 
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[vi--^niVn-l)]i^) 



Tn-l 



Application of Corollary A.l gives 
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2/3 
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Tn-l 

_VAr^ 
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2/3 
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Tn-l 



V VniGn) 



1/3 



(18) 
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Now turning to the expectation on the R.H.S. of the inequahty, we have 



E 



Using the fact that, via (Mi) 



^0^V^UGnj 



{N - 1)3/2 



E[T^ - iTl + 3T„ - 1] 



$„(r?^-)(B„) A $„(r?J"-)(B^) > c (19) 

for deterministic c and standard properties on raw moments of negative binomial random variables, it follows that 

Wn - 3^n + 3Tn - 1] < CN^. 
Thus, we can show that 

E 
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V^ 



VVn{Gn) 



1/3 



< CrjniGnY^'N 



l/2i\rl/2 



Returning to (18), we have shown that 
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[V^--^n{vl-l')m 



Tn 



.Vtv^T 



\fnJGn) 



3/2' 



2/3 



< Cr^n{GnY^^ 



which completes the proof of 1. 

Proof of 2. By Lemma B.3 and the continuous mapping theorem, we have that B{N) -^f> 0. Thus if we 
can show that for some ^ > 0, sup^>i E[|5(A^)p^^+'^^] < +oo this will allow us to conclude. For simplicity of 
calculation, we set S = 1/3. Then, using the fact that l/{Tn — 1) < 1/{N — 1) we have 



E[|5(A/')|'^] <E 



1 



Vn{Gn)){Tn-l) 



N-1 
On expanding the brackets and removing the negative terms, the expectation on the R.H.S. is upper-bounded by 

1 + ^|^E[T„ - 1] + ^|^E[T2 - 2T„ + 1]. 

Using the conditional negative binomial property of T„ this expression is equal to 



1+ 
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tH 



$„(r?„"r)(G„) 
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2N 



*„(<r/)(G„)= 



$„(r?r"r)(G„)2 $„(r?r"r)(G„) 



+1 



Applying (19) we easily show that this latter expression is, uniformly in A'', upper-bounded by a constant G. That is, 
we have shown that supjv>i E[|i?(7V)|3(i+'^)] < +00, which completes the proof of 2. This completes the proof. D 



Lemma B.3. Assume (Mi). Then for any n> 1, we have: 

Tn 1 

N rin{G„) 



->pO. 



Proof. We give the proof for any n > 2; the case n = 1 follows a similar proof with only notational modifications. In 
Theorem 3.1, we have proved that ^n{Vn-~i^)iv) converges almost surely to rin{(p) for ip G Bb{E). Thus we consider 

'T„ 1 x2l 
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(£n_ 



$„(r?^-i^)(G„) 



Now conditionally upon ^n-i, ^n is a negative binomial random variable with success probability ^ri{'nn-~i^)iGn)^ 
so writing Xi^n{N) as (conditionally) independent geometric random variables with the same success probability, 
we have 



E 



L^^ ^n{vt-l'){Gn)^ 



E 
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^ 1 
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Applying the conditional version of the M-Z inequality on the R.H.S. of the inequality, we have the upper-bound: 



C_ 



E 






recalling that ^n{'fln-i){^n) ^ c for some deterministic constant c we conclude that 



E 



'^ ^n(r?rr/)(Cn) 



< 



N' 



The proof is completed on recalling that ^n{Vn-i^)i^n) converges almost surely to r]n{Gn)- 

C Technical Results for the Normalizing Constant 

Lemma C.l. We have for any n > 1, N > 2 and cp G B5(E^)^ that 
where ^i{r]_1^){(p) = Mi{(p). 



D 



Proof. We have, for any n > 1, A/" > 2 that Tn\^n-i is a Negative Binomial random variable with parameters 

A^ — 1 and si 
Zacks (1980) 



A^ — 1 and success probability ^n{Vn-i)i^n) = ^n{Vn-i)i^n) and note that from Neuts & Zacks (1967) and 
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Now, 
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where we have used the fact that there are A^ — 1 particles that are 'alive' and T^ — N that will die and used the 
conditional distribution of the samples given T^. Now by (20), it follows then that 



E[r?I"M|#-„-i] = $„(r,^"-)MBj + $„(r?^ri^)MBs) = '^nivl'Ti^i^) 
which concludes the proof. 
Lemma C.2. We have for any n > 2, N > 3, cp e B5(E^); 

Proof We have: 
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The three terms on the R.H.S. arise due to the {N — 1){N — 2) different pairs of particles which land in B^ (21), 
the 2{N-l){Tn-N) pairs of different particles which land in B^ x B^ (22) and the (T^ - TV) (T^ - TV - 1) different 
pairs of particles which land in (B^)^ (23); the factors of <^n(^n-l^) ^irise from the conditional distributions of the 
particles given T^ (recalling that conditional on ^n-i, ^n is a negative binomial random variables parameters N 

Now for (21), we have from Neuts & Zacks (1967) and Zacks (1980) that 

{N -l){N-2) 



E 
so that (21) becomes 



{Tn-l){Tn-2) 



^^n— 1 



$„(r?:"r)«^(B;;). 



Recalling (20) and using the above result, (22) becomes 

Finally, noting that for any t ^ 1,2 l/(t — 2) = l/(t — 1) + l/[(t — l)(t — 2)], and thus using the above results that 



E 
it follows that (23) is equal to 
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Hence we have shown 

n{ril-r\^Wn-l] = ^n{rt-\T\lBl^) + 2^.(r^^r/)^^(lB.xB^V^) + ^n{vl-\T\hB^J^^) 
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